ABSTRACT

This study tests several explanations for discrepant
results in an earlier study (Cook et al., 1985) which presented a
partial pre-calibration method for equating new editions of the
Scholastic Aptitude Test (SAT) to the same scale as older editions.
In contrast to full pre-calibration, which seeks to equate all items
from two or more editions, the partial method designates a subset of
items to serve as an equating section; both methods rely on item
response theory (IRT) and employ LOGIST to achieve calibration.
Several verbal and mathematics equatings selected from the larger
study were subjected to further tests. Estimation error of item
parameters was small; instead, the source of difficulty rested with
the IRT characteristic curve transformations, which failed to
adequately equate the series of calibration runs to the same scales.
Possible differences in ability levels at different test
administrations could not account for the discrepant equatings.
Finally, the study identified items that functioned differently (DIF
items) in the two groups used for equating and calibration.
Eliminating these items from the partial pre-calibration runs,
however, did not improve the equatings significantly. Figures depict
the calibration designs, item response functions, and other results.
(LPG)




Characteristics of Samples and Linking Items Affecting
a Partial Pre-calibration Design

Linda L. Cook
Daniel R. Eignor
Marilyn S. Wingersky

Educational Testing Service






1 A paper presented at the annual meeting of AERA, Washington, 1987.

2 This project was supported by the College Board through Admissions Testing
Program funding.

3 The authors would like to acknowledge the assistance or advice of
Nancy Petersen, Fred McHale, Nancy Wright, Karen Carroll, and Ted Blew in
completing this project.

INTRODUCTION 

Since 1982, IRT equating of new editions of the Scholastic Aptitude Test
(SAT) has been based, except for a small number of instances, on three-parameter
logistic model item parameter estimates (see Lord, 1980) obtained from the
concurrent calibration of items from the new edition, two equating tests, and two
old editions of the test, using data from two samples taking the new edition of
the test and a sample from each group taking the old editions of the test (see
Figure 1). In a concurrent calibration design, item parameters for the three
total tests and the two equating tests are estimated and placed on a common scale
in a single calibration run. The computer program LOGIST (Wingersky, Barton, and
Lord, 1982; Wingersky, 1983) has been used to perform the item calibration
needed. Scores on the new edition are then equated to scores on each of the
earlier editions, using IRT true-score equating (Lord, 1980), and the results
averaged. This type of IRT equating uses exactly the same data collection design
that was used for the traditional non-IRT equating of SAT-verbal and
SAT-mathematical done prior to 1982. The calibration design is based on the SAT
braiding plan (Angoff, 1974) and is considerably limited in its flexibility.
Scores on the new test edition can only be equated to scores on old editions that
were administered with the same equating sections as those given with the new
edition. A more flexible equating procedure would take advantage of item
parameter estimates from test editions given at a number of different SAT
administrations that are on one common scale.
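
As a rough illustration of IRT true-score equating under the three-parameter
logistic model, the sketch below finds the ability at which the new edition's
test characteristic curve equals a given true score, evaluates each old
edition's curve at that ability, and averages the two results. The item
parameters, the scaling constant D = 1.7, the number-right scoring (the SAT
itself used formula scores), and all function names are assumptions made for
illustration; nothing here is taken from LOGIST or from the operational
procedure.

```python
import math

D = 1.7  # assumed scaling constant for the logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-right true score."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def theta_for_score(score, items, lo=-10.0, hi=10.0):
    """Invert the monotone test characteristic curve by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate(score, new_items, old_items):
    """True score on the old edition equivalent to `score` on the new one."""
    return tcc(theta_for_score(score, new_items), old_items)


# Hypothetical (a, b, c) estimates, assumed to be on one common scale already.
new_ed = [(1.0, -0.5, 0.2), (0.8, 0.3, 0.2), (1.2, 1.0, 0.2)]
old_1 = [(1.0, -0.3, 0.2), (0.8, 0.5, 0.2), (1.2, 1.2, 0.2)]  # slightly harder
old_2 = [(1.0, -0.7, 0.2), (0.8, 0.1, 0.2), (1.2, 0.8, 0.2)]  # slightly easier

score = 2.0
# Equate to each old edition separately, then average the two results.
averaged = 0.5 * (equate(score, new_ed, old_1) + equate(score, new_ed, old_2))
```

True scores below the sum of the c parameters cannot be inverted this way, so
the sketch only makes sense for scores above that floor.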

Insert Figure 1 about here 

The most flexible calibration design that could be used with the SAT would
be a full pre-calibration design, which would lead to pre-equating of the verbal
and mathematical sections. Pre-equating refers to the process of establishing a
conversion from raw to scaled scores prior to the time the new test edition is
administered operationally. The process depends on the adequate pretesting of a
pool of items from which the new test edition will be assembled, the calibration
of these items using IRT methods, and the utilization of a linking scheme to
place the IRT item parameter estimates on a common scale. The last step is,
perhaps, the most critical step. Unlike the concurrent calibration design, where
the necessary item parameter estimates are automatically on the same scale
because there is only one calibration run, for the pre-calibration design, there
will be multiple calibration (LOGIST) runs and the parameter estimates will
initially be on the unique scales defined by the ability distributions of the
samples used in the separate LOGIST runs (see Cook and Eignor, 1983). It is
possible, however, if the IRT model fits the data, and there are common items
between calibration runs, to determine a linear relationship that can be used to
transform item parameter estimates from one calibration run to the scale of the
parameter estimates from another calibration run. Hence, it is not the existence
of unique scales that presents a problematic hurdle for IRT pre-equating but,
rather, the need to have sets of common items between calibration runs that would
ultimately allow placement of all pretest parameter estimates for items
constituting final editions of a test on one common scale. Further, feasibility
studies investigating the possibility of pre-equating the SAT (Eignor, 1985;
Eignor and Stocking, 1986) have provided results that have been, for the most
part, unacceptable. For these reasons, a somewhat less flexible calibration
design than pre-equating, but certainly more flexible than the concurrent
calibration design presently used, was seen as worthy of investigation. This
design, which is called partial pre-calibration, is described first in the
following paragraphs, and then the results of a feasibility study (Cook, McHale,
Eignor, Petersen, and Dorans, 1985) investigating the possibility of its use are
described. The current investigation involves further study of selected results
from the Cook et al. (1985) study.
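
The linear relationship between the scales of two calibration runs can be
sketched as follows. If abilities on the target scale are A times abilities on
the source scale plus B, then discriminations are divided by A, difficulties
are multiplied by A and shifted by B, and lower asymptotes are unchanged, which
leaves every response probability intact. The slope and intercept values, the
item, the scaling constant D = 1.7, and the function names below are
hypothetical.

```python
import math

D = 1.7  # assumed scaling constant for the logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rescale_item(item, A, B):
    """Move one item's (a, b, c) estimates onto the target scale."""
    a, b, c = item
    return (a / A, A * b + B, c)


item = (1.1, 0.4, 0.2)   # hypothetical estimates on the source scale
A, B = 1.25, -0.3        # hypothetical slope and intercept of the link
theta = 0.7              # an ability on the source scale

p_source = p3pl(theta, *item)
p_target = p3pl(A * theta + B, *rescale_item(item, A, B))
# p_source and p_target agree up to rounding: the link only relabels the scale.
```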

The essential feature of a partial pre-calibration design is that the items
from the equating test have been calibrated and placed on a common scale prior to
their administration with a new edition of the SAT. (In full pre-calibration,
all the items in the new edition have been calibrated prior to the administration
and placed on a common scale, not just an equating section.) In performing the
equating, data is collected from the sample who take the new edition and also the
equating test, for which IRT parameters have been previously estimated. The
parameter estimates for the equating items, which are recalibrated with the new
edition and which already exist on the common scale from a previous calibration,
provide the link necessary to place new edition item parameter estimates on the
common scale. With the existence of multiple equating sections containing items
on the common scale comes a degree of flexibility not offered by the concurrent
calibration design. A distinct advantage of the partial pre-calibration design
is that equating sections are interchangeable; any equating section with items on
the common scale can be administered with the new edition for equating purposes,
not just those equating sections that were given with the old editions to be used
in the equating, as is the case for the concurrent calibration design based on
the braiding plan. (Note that in order for partial pre-calibration to work,
any old editions that might be used in the equating must also contain items on
the common scale.) In addition to this greater degree of flexibility, there is a
[illegible] cost savings associated with the equating of the new edition in that
the old editions to be used in the equating do not have to be recalibrated with
the new edition, as is presently the practice.

Once the new edition parameter estimates are placed on the common scale,
multiple equatings to all old editions with item parameters on the common scale
become possible. In the concurrent design, which uses equating plans laid out in
the SAT braiding plan, equating to only the two old editions that were
administered with the common item material is possible. The use of multiple old
editions in the equating process should ultimately improve upon current equating
practice and make scores more consistent from one administration to another.

As mentioned previously, Cook et al. (1985) conducted a feasibility study
investigating the possibility of using a partial pre-calibration design to equate
new editions of the SAT. In their study, item parameters needed for the
equatings were either estimated through a number of individual LOGIST calibration
runs done specifically for the study or obtained from previous concurrent
calibration runs performed in the context of operational IRT equating. Each of
the calibrations, be they concurrent calibrations previously done or calibrations
done specifically for the study, produced item parameter estimates on a scale
particular to the calibration run. Parameter estimates from the separate
calibrations were then placed on one common scale. Among the equating tests
calibrated in any given calibration run in this study was an equating test that
was calibrated in another run. Thus, two sets of parameter estimates existed for
these items that were on different scales because they resulted from two
different calibration runs. The characteristic curve transformation procedure
(Stocking and Lord, 1983) was used to place the item parameter estimates from the
"current" run on the scale of the parameter estimates obtained in the previous
run. For the study, an extensive design was developed that permitted placing
item parameter estimates for 24 SAT-verbal editions and 24 SAT-mathematical
editions on a common scale (one for SAT-verbal, one for SAT-mathematical) defined
in November 1982 when a particular edition of the SAT, designated E8, was first
administered. Items from equating sections that were calibrated along with the
different editions of the SAT were also placed on this scale. Further details
and pictorial representations of these calibration designs may be found in the
Appendix of this paper.
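
A minimal sketch of a characteristic curve transformation in the spirit of
Stocking and Lord (1983) is given below. It chooses the slope A and intercept B
that make the common items' test characteristic curve from the "current" run,
once rescaled, match the curve from the previous run as closely as possible in
the squared-difference sense. Several things are assumptions rather than
features of the study: the fixed grid of abilities used in place of examinee
ability estimates, the simple compass search used as the minimizer, the scaling
constant D = 1.7, the simulated item parameters, and every function name.

```python
import math

D = 1.7  # assumed scaling constant for the logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve of the common items."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def fit_transformation(current, previous, thetas):
    """Find (A, B) placing `current` estimates on the scale of `previous`."""
    target = [tcc(t, previous) for t in thetas]

    def loss(A, B):
        moved = [(a / A, A * b + B, c) for a, b, c in current]
        return sum((y - tcc(t, moved)) ** 2 for t, y in zip(thetas, target))

    # Compass search: try axis steps, halve the step when none improves.
    A, B, step = 1.0, 0.0, 0.25
    best = loss(A, B)
    while step > 1e-6:
        improved = False
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            if A + dA <= 0.1:  # keep the slope positive
                continue
            value = loss(A + dA, B + dB)
            if value < best:
                A, B, best, improved = A + dA, B + dB, value, True
        if not improved:
            step /= 2.0
    return A, B


# Simulated check: build "current" estimates from "previous" ones with a known link.
previous = [(0.9, -1.2, 0.2), (1.1, -0.4, 0.2), (1.0, 0.1, 0.25),
            (1.3, 0.8, 0.2), (0.8, 1.5, 0.2)]
true_A, true_B = 1.1, 0.2
current = [(a * true_A, (b - true_B) / true_A, c) for a, b, c in previous]
thetas = [-3.0 + 0.5 * k for k in range(13)]

A_hat, B_hat = fit_transformation(current, previous, thetas)
```

With error-free simulated parameters the recovered slope and intercept should
sit close to the generating values; with real estimates the fit is only as good
as the common items allow.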

Once calibrations were completed and all item parameter estimates were
placed on the E8 base scale, it was possible to equate the scores from any
particular edition of the SAT to the scores from any other edition of the SAT
used in the study. For purposes of the study, the test to be equated was treated
as if it had never been equated previously. IRT true-score equating (Lord, 1980)
was then carried out to the same two old editions that were used when the new
edition was equated operationally through a concurrent calibration design.
Hence, an appropriate criterion existed in all cases against which to compare the
results of the experimental equatings, i.e., the operational equatings resulting
from the concurrent calibrations which were used for score reporting. As
mentioned previously, the study design also permitted equating each new edition
to multiple (more than two) old editions. However, the maximum number of old
editions used for the equating in the Cook et al. (1985) study was two.
The results obtained from the equatings in the Cook et al. study were
somewhat difficult to interpret. Some of the equatings agreed very closely with
the criterion equatings based on the concurrent calibration design and some
produced quite discrepant results. Cook et al. also found it difficult to
conclude, based on the residual plots and tabulations they prepared, whether or
not a partial pre-calibration design appeared to be more feasible for SAT-verbal
than for SAT-mathematical. For both tests, there were a number of equatings that
produced residuals greater than 20 scaled-score points. These results led Cook
et al. to question very seriously the implementation of a partial pre-calibration
design for either SAT-verbal or SAT-mathematical.

A search for possible explanations for why some equatings produced smaller
residuals than others in the Cook et al. study was not particularly fruitful.
For instance, whether or not new or old editions of a test were linked to the
base scale by a single transformation or by several transformations, prior to
equating, seemed to have little effect. Efforts to evaluate the effect of
particular equating tests that were used more than once also resulted in
conflicting information. Cook et al. concluded their paper by listing a number
of additional factors that could have possibly affected their partial
pre-calibration results. Several of these factors are investigated in detail in
this paper. In summary, Cook et al. felt that it did not seem unreasonable to
hope that if some of the factors affecting the viability of the partial
pre-calibration design could be determined and controlled, equatings based on
partial pre-calibration would eventually provide reasonable results.

PURPOSE 

The purpose of this study was to investigate, for selected equatings from
the Cook et al. (1985) study, factors that could have possibly affected their
partial pre-calibration results. First, a series of item parameter
transformations and equatings were carried out in an attempt to determine if the
poor results from the previous study were related to the item parameter
transformations or the item calibrations. Two factors that could influence
the transformation process were examined: 1) possible differences in
the ability levels of samples of examinees used to calibrate the items used
for linking purposes, and 2) the possible existence of differential item
functioning of the linking items for the samples used to calibrate the data. The
results of two recently completed studies (Stocking and Eignor, 1986; Cook,
Eignor, and Wingersky, 1987) suggest that these two factors are viable candidates
for explaining the poor partial pre-calibration results of the Cook et al. (1985)
study.


Stocking and Eignor (1986) provided simulation results that have
implications for the process of equating when there are large differences in
ability between equating samples. These researchers were interested in what
effects the ability levels of the samples used would have on three-parameter
logistic model parameter estimates and subsequent equating results. Using LOGIST
for estimation purposes, they found that differences in mean true ability between
samples used in equating can cause differences in the precision with which
parameters are estimated, even when the test data fit the particular model used.
The effect of this differential precision in estimation on test equating can be
substantial if the samples begin to have fairly large differences in true ability
(i.e., a difference in means of one or more standard deviations on the ability
scale). A more detailed explanation for why differences in mean true abilities
can cause differences in the precision of the parameter estimates can be found in
Stocking and Eignor (1986, pp. 11-13).
Cook, Eignor, and Wingersky (1987) investigated the effects on equating
results of common item sections that contained a few items whose item response
functions were not well fit by the three-parameter logistic model and of common
item sections that contained a few items on which the two groups taking the
common item section responded differently. Their study was also carried out
using the three-parameter logistic item response theory model and Monte Carlo
procedures. So that the simulated data reflected actual test data, the true item
parameters were taken from the estimated parameters obtained from LOGIST
calibrations of item responses obtained from selected administrations of the
verbal sections of the SAT. These effects were investigated using both the
concurrent calibration and the characteristic curve transformation linking
procedures for varying numbers of common items. The effects of these common item
sections were studied using a uniform ability distribution and certain
characteristics of the parameter estimates for the common items (i.e., the
parameter estimates for the items all had small standard errors of estimation)
selected as a result of the findings of a previous study (Wingersky, Cook, and
Eignor, 1986).

Results of the Cook et al. (1987) study indicated that equatings,
particularly those obtained using a characteristic curve transformation design,
are seriously affected by the presence of linking items that function differently
for the two groups used to provide data for the equating and calibration. This
is particularly true for shorter linking tests. They concluded that the quality
of an equating depends, somewhat, on prior screening of linking tests and removal
of items that function differently for the two groups.
METHODOLOGY
Choice of Equatings to Study
From the verbal and mathematical experimental equatings (i.e., equatings
based on a partial pre-calibration design) in the Cook et al. (1985) study, one
verbal and one mathematical equating were chosen for further study. Each of these
equatings actually involved a pair of single equatings that were averaged, i.e.,
a pair is an equating of one new test edition to two old editions. The specific
verbal and math equatings were chosen because: 1) one equating in the pair that
were averaged gave excellent results when compared to the operational concurrent
calibration criterion equating while the second equating in the pair gave
extremely discrepant results, and 2) the parameter estimates for the new edition
to be equated in each case came from the operational concurrent calibration run
from which equating results used for score reporting were derived. This allowed
special analyses, described in detail later in the paper, to be developed.
These analyses were used to explore the calibration runs in an effort to
determine if the discrepant equatings resulted from problems with the estimation
process or with the item transformation process.

In Figure 2, the equating relationships among the editions chosen for
further study are depicted; these relationships are defined in the SAT braiding
plan (Angoff, 1974). Upper case letters and numbers designate operational
editions; lower case letters designate equating sections. The equating sections
depicted in Figure 2 are those used in the concurrent calibration of the new and
old editions.



Insert Figure 2 about here 
Figure 3 contains portions of the SAT-verbal and SAT-mathematical partial
pre-calibration transformation plans presented in Cook et al. (1985) and in the
Appendix of this paper. In each case, the portion depicted contains the specific
editions being investigated in the current study. In the Cook et al. (1985)
study, for both SAT-verbal and mathematical, all parameter estimates for the
editions to be equated were transformed to the scale defined by Edition E8 (run 1
in Figure 3), using the characteristic curve transformation method (Stocking and
Lord, 1983), and then the equatings were performed. For SAT-verbal, the E7 to C5
equating, after placing all parameter estimates on the E8 scale, gave inferior
results when compared to the criterion equating from the concurrent calibration,
while for SAT-mathematical, the F3 to E3 equating gave inferior results. Figure
4 contains residual or difference plots for the four individual verbal and
mathematical equatings being studied. In each case, raw (formula) score
differences (partial pre-calibration results minus concurrent calibration
criterion results) are shown for the range of possible raw scores.
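
The residuals plotted in this way amount to a subtraction at each raw score,
which the short sketch below spells out. The conversion values are made up for
illustration and the function name is hypothetical; no figures from the study
are reproduced.

```python
def residuals(experimental, criterion):
    """Experimental minus criterion equated score at each raw score."""
    return {raw: experimental[raw] - criterion[raw] for raw in criterion}


# Made-up equated scores keyed by raw score (not data from the study).
criterion = {0: 0.4, 10: 10.2, 20: 20.1, 30: 29.8}
experimental = {0: 0.9, 10: 10.1, 20: 21.6, 30: 31.5}

diffs = residuals(experimental, criterion)
# Raw score at which the two equatings disagree most in absolute terms.
worst_raw = max(diffs, key=lambda raw: abs(diffs[raw]))
```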

Insert Figures 3 and 4 about here 

Exploration of Calibration Runs 
In an attempt to explore possible explanations for the differences in
quality of the individual SAT-verbal and SAT-mathematical partial pre-calibration
equatings under study, additional item transformations and equatings, making use
of data from the Cook et al. (1985) study, were performed. Most of the equatings
and transformations made use of parameter estimates for the editions involved
that had already been placed on the base scale (run 1 in Figure 3) as part of the
previous study (Cook et al., 1985).
Item Parameter Estimate Transformations

The first set of experimental transformations and equatings carried out for
this study were performed in an attempt to assess how well the transformations
used for the partial pre-calibration study (Cook et al., 1985) placed the item
parameter estimates for editions used in the equatings on the same scale.
Reference to Figure 3 will be useful in understanding the description that
follows. In the previous study, SAT-verbal old edition B7 was calibrated in two
separate LOGIST runs, run 2 (which also contained new edition E7) and run 4;
parameter estimates obtained in both runs were then placed on the scale defined
by run 1. The same sort of situation existed for old edition C5 in that it was
calibrated separately in runs 2 and 3. After parameter estimates for test
editions calibrated in runs 2, 3 and 4 were placed on the scale of run 1, new
edition E7 (calibrated in run 2) was equated to old editions C5 and B7
(calibrated in runs 3 and 4, respectively).

One way to assess whether or not parameter estimates for the separate
calibration runs shown in Figure 3 were adequately placed on their respective
verbal or mathematical run 1 scales is to compare the transformed parameter
estimates for, say, verbal edition C5 as it appears in runs 2 and 3. That is,
compare the transformed parameter estimates for this edition resulting from the
specific transformations carried out for the partial pre-calibration study. To
make comparisons such as the one just described, a series of additional item
parameter transformations and test equatings were carried out.

First, referring to the SAT-verbal part of Figure 3, the characteristic
curve transformation procedure (Stocking and Lord, 1983), the same procedure that
was used for the partial pre-calibration study to place runs 2-4 on the scale of
run 1, was used to place runs 3 and 4 directly on the scale of run 2. For this
experimental transformation, all 85 items in edition B7 (or C5) were used as the
linking test. Next, the linear parameters derived from the transformation of B7
item parameter estimates in run 4 to the scale of B7 parameter estimates in run 2
were examined, along with the linear parameters obtained from the transformation
of item parameter estimates for edition C5 in run 3 to the scale of item
parameter estimates for edition C5 appearing in run 2. If the item parameter
estimates of the respective test editions were on scale together as a result of
the partial pre-calibration study transformations, the linear parameters of the
transformations obtained by this "direct link" approach should be very close to
those of a 45° line, i.e., a line with a slope of one and an intercept of zero.
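
The logic of this check can be sketched with a much simpler linking method than
the one the study used. The example below estimates a slope and intercept by
the mean/sigma method applied to difficulty estimates, which is a stand-in
chosen for brevity and not the characteristic curve procedure, and then asks
whether the result is close to slope one and intercept zero. The difficulty
values, the tolerance, and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_source, b_target):
    """Mean/sigma slope and intercept taking source difficulties to target."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def near_identity(A, B, tol=0.05):
    """True when the link is close to slope one and intercept zero."""
    return abs(A - 1.0) <= tol and abs(B) <= tol


# Hypothetical difficulty estimates for the same items from two runs,
# both already transformed to what should be a single base scale.
b_run2 = [-1.20, -0.40, 0.10, 0.75, 1.60]
b_run4_ok = [-1.18, -0.43, 0.12, 0.73, 1.61]   # differs only by small noise
b_run4_off = [0.9 * b + 0.3 for b in b_run2]   # deliberately left off scale

A_ok, B_ok = mean_sigma(b_run4_ok, b_run2)
A_off, B_off = mean_sigma(b_run4_off, b_run2)
```

A link that departs clearly from the identity, as in the second case, signals
that the earlier transformations did not put the two runs on one scale.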

Similar transformations were carried out to investigate the partial
pre-calibration results for the selected SAT-mathematical editions described in
Figure 3. All special transformations carried out for this portion of the study
are summarized in Table 1.

Insert Table 1 about here 



In addition to the transformations described above, special equatings were
carried out to gather additional information regarding whether or not the
transformations used for the partial pre-calibration study succeeded in placing
item parameter estimates for the separate calibration runs on their respective
verbal or mathematical run 1 scales. Referring again to the verbal portion of
the calibration scheme depicted in Figure 3, test editions B7 and C5 were equated
to themselves using one set of item parameter estimates that were the result of
the previously described direct link transformations and a second set of item
parameter estimates that were the result of the transformations carried out for
the partial pre-calibration study. For example, edition B7 appearing in run 4
that had been placed directly on the scale of run 2 (parameter estimates
resulting from the direct link transformation; subsequently referred to as run
4*) was equated to edition B7 appearing in run 4 (parameter estimates resulting
from the transformations carried out for the partial pre-calibration study).
Similar equatings to those carried out for the verbal editions were carried out
for mathematical editions G3 and E3. These equatings are referred to in Table 1
as special equatings 1.

The results of special equatings 1 were interpreted in the following manner.
If the transformations carried out for the partial pre-calibration study resulted
in placing the calibration runs on their respective verbal or mathematical run 1
scales, special equatings 1 should provide equating relationships represented by
a 45° line. Any deviation from the expected results of equating a test to itself
was interpreted as indicating a problem with the item parameter transformations
developed in the partial pre-calibration study.
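
Equating a test to itself can be sketched as below: the same items are
represented by two sets of parameter estimates, and the largest departure of
the true-score equating from the identity line is reported. When the two sets
are on one scale the departure is essentially zero. The item parameters, the
artificial 0.15 shift in difficulty, the scaling constant D = 1.7, the
number-right scoring, and the function names are assumptions for illustration.

```python
import math

D = 1.7  # assumed scaling constant for the logistic model


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-right true score."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def theta_for_score(score, items, lo=-10.0, hi=10.0):
    """Invert the monotone test characteristic curve by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equate(score, items_x, items_y):
    """True score under `items_y` equivalent to `score` under `items_x`."""
    return tcc(theta_for_score(score, items_x), items_y)


def max_departure(items_x, items_y, scores):
    """Largest absolute distance of the equating from the 45-degree line."""
    return max(abs(equate(s, items_x, items_y) - s) for s in scores)


# One hypothetical edition described by two sets of estimates.
direct_link = [(1.0, -0.5, 0.2), (0.8, 0.3, 0.2), (1.2, 1.0, 0.2)]
same_scale = list(direct_link)                                # scales agree
off_scale = [(a, b + 0.15, c) for a, b, c in direct_link]     # scales disagree

scores = [1.0, 1.5, 2.0, 2.5]
on_line = max_departure(direct_link, same_scale, scores)
off_line = max_departure(direct_link, off_scale, scores)
```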

The final set of equatings that were carried out to examine the
transformations resulting from the partial pre-calibration study are referred to
in Table 1 as direct link equatings. These equatings did not involve equating an
edition to itself, but rather involved equating a new edition to an old edition
of the test (e.g., verbal edition E7 to edition C5). For the new editions of the
test, the equatings used run 2 item parameter estimates that were a result of the
transformations carried out for the partial pre-calibration study. Transformed
parameter estimates for the old test editions used in these direct link
equatings were obtained by the transformations described previously, which used
the entire 85 item verbal or 60 item math test to place item parameter estimates
directly on the respective run 2 scales. The direct link equating results were
then compared to the equating results derived using the concurrent calibrations
and partial pre-calibration study transformations of the same new and old
editions.

The following example should clarify how the direct link equatings were
carried out and how the results were interpreted. Consider SAT-mathematical new
edition F3 (run 2) and old edition E3 (run 5), where the E3 parameter estimates
have additionally been placed on the scale of run 2 by the direct link
transformation, i.e., E3 (run 5)*. An equating of F3 to E3 under these
conditions (referred to as a direct link equating) can be compared to the partial
pre-calibration equating of F3 to E3 done in the Cook et al. (1985) study and to
the Cook et al. criterion concurrent calibration equating of F3 to E3. If the
partial pre-calibration results are the outlier, this can be taken as a further
indication that the linkage of E3 in run 5 to the base scale (run 1) was
inadequate. Equatings such as the one just described for the mathematical new
edition F3 and old edition E3 were also carried out for mathematical new edition
F3 and old edition G3 and for verbal new edition E7, equated to old editions C5
and B7, respectively.

Errors of Estimation 

Another possible source of the discrepant partial pre-calibration study
equating results, obtained for the equating of verbal edition E7 to edition C5
and the equating of mathematical edition F3 to edition E3, is errors of
estimation for the item parameters calibrated in the separate LOGIST runs. The
equatings designated in Table 1 as special equatings 2 were carried out in an
attempt to explore the possibility of estimation errors.

Referring again to the verbal portion of the calibration scheme depicted in
Figure 3, test editions B7 and C5 were equated to themselves using one set of
item parameter estimates that were the result of the previously described direct
link transformations and a second set of parameter estimates resulting from the
transformations carried out for the partial pre-calibration study. For example,
edition C5 appearing in run 3 that has been placed directly on the scale of run 2
(parameter estimates resulting from the direct link transformations and
subsequently referred to as run 3*) was equated to edition C5 appearing in run 2
(parameter estimates resulting from partial pre-calibration study
transformations). These equatings should result, once again, in a 45° line, if
the direct link transformation is viable and if errors of estimation did not
seriously affect the parameter estimates of the items in the test edition as it
appears in the separate calibration runs. Since the direct link transformations
are based on a single transformation using a linking test containing 85 items, it
seems reasonable to assume that any discrepancy from a 45° line obtained by the
equatings is related to estimation errors in the two calibration runs of
interest. Special equatings 2 were carried out for the two verbal and two
mathematical equatings investigated in this study.

Differences in Ability Levels of Samples
As mentioned previously, Stocking and Eignor (1986) demonstrated the effect
that the ability levels of samples used in three-parameter logistic model
calibrations can have on subsequent IRT equating results. The results of the
Stocking and Eignor (1986) study may be of relevance in explaining the Cook et
al. (1985) poor partial pre-calibration results. If, for example, the ability
levels of the groups used in calibration runs 2-4, for the verbal test editions
identified in Figure 3, are widely disparate from the ability level of the group
in calibration run 1, then these differences may be large enough to cause
problems for the calibration procedures used. For SAT-mathematical, an
additional relevant comparison would involve comparing the ability level of the
group in calibration run 5 to that of run 4. Raw score means and standard
deviations on the common item linking sections between the calibration runs
identified in Figure 3 were used to provide an indication of possible differences
in ability levels of the samples.

Differential Item Functioning in Linking Tests
As mentioned earlier, Cook et al. (1987) demonstrated the effect on IRT
equatings of contamination of linking item sets through the presence of a few
linking items that functioned differently (DIF items) for the two groups used to
provide data for equating and calibration. In the Cook et al. (1985) study, the
presence of DIF items in the common item linking sections could have affected
equating results for the partial pre-calibration equatings as well as for the
criterion concurrent calibration equatings. If DIF items were present and did
have an effect on partial pre-calibration results, one would suspect more such
items, or items exhibiting extreme differences, for the SAT-verbal and
mathematical partial pre-calibration equatings that provided inferior results,
i.e., E7 to C5 for verbal and F3 to E3 for mathematical.

For the partial pre-calibration runs, two separate sets of parameter
estimates exist for each linking item from each of the separate calibrations.
Plots of the item characteristic curves (with parameter estimates on a common
scale) from the separate calibrations were obtained. In addition, as a measure
of the discrepancy between the item characteristic curves estimated in the
separate calibrations for each common item, a weighted mean absolute difference
(MAD) value was calculated. Using all individuals in the larger of the two
samples taking each linking item, the absolute difference between the two item
response functions for each person (i.e., value of θ) was obtained and then
averaged over individuals.

Referring to Figure 3, the following linking sections were studied using the
above described plot and index. For SAT-verbal, the common item sections linking
runs 2-4 to run 1 (gw linking 2 to 1, gs linking 3 to 1, and gw linking 4 to 1)
were studied. For SAT-mathematical, the common item sections linking runs 2-4 to
run 1 (gh linking 2 to 1, gh linking 3 to 1, hf + gt linking 4 to 1) and run 5 to
run 4 (gj) were studied. It should be noted that hf + gt constitutes a pooled
linking section containing twice the number of items (50) contained in
the usual SAT-mathematical equating or common item section. Reasons for using a
pooled linking section to link these runs can be found in Cook et al. (1985).
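
The weighted MAD index described above can be sketched in a few lines. The
sketch assumes the three-parameter logistic response function with the
conventional scaling constant D = 1.7 (the constant is not stated in this
section), and the function names and example values are hypothetical.
Averaging over the ability estimates of the individuals in the sample is what
weights the index by the ability distribution of that sample.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def weighted_mad(thetas, params_1, params_2):
    """Mean absolute difference between two response functions for one item.

    thetas   -- ability estimates, one per individual in the larger sample
    params_1 -- (a, b, c) from the first calibration, on the common scale
    params_2 -- (a, b, c) from the second calibration, on the common scale
    """
    diffs = [abs(icc_3pl(t, *params_1) - icc_3pl(t, *params_2)) for t in thetas]
    return sum(diffs) / len(diffs)


# Hypothetical ability estimates and parameter estimates for one linking item.
sample_thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
mad = weighted_mad(sample_thetas, (1.0, 0.0, 0.20), (1.1, 0.1, 0.18))
print(round(mad, 4))
```

Two identical sets of estimates give a MAD of zero; larger values indicate that
the two calibrations disagree about how the item behaves for the examinees who
actually took it.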

RESULTS 

Exploration of Transformation Runs
Item Parameter Estimate Transformations

Table 2 contains the linear parameters obtained from the previously
described direct link transformations. The two verbal transformations consisted
of placing item parameter estimates obtained in calibration runs 3 and 4 on the
scale of run 2 (see Figure 3) using the 85 items contained in either edition C5
or edition B7 as the linking test. Similarly, the transformations carried out
for the two SAT-mathematical editions consisted of placing the parameter
estimates obtained in calibration runs 3 and 5 on the scale of run 2 using the 60
item E3 and C3 editions as linking tests.

Insert Table 2 about here 



The information provided in Table 2 indicates that verbal edition B7 and
mathematical edition C3, appearing in runs 4 and 3 respectively, were very nearly
on the same scale as these same editions appearing in the verbal and mathematical
calibration runs 2; i.e., the slopes and intercepts of the linear transformations
are close to one and zero. On the other hand, linear parameters obtained for the
transformation runs that placed verbal edition C5 and mathematical edition E3
directly on the scales of the verbal and math run 2 calibrations indicate that
these editions were not adequately placed on their respective run 1 scales by the
partial pre-calibration study transformations. This information leads one to the
conclusion that the transformations carried out for the partial pre-calibration
study, designed to place the item parameter estimates for verbal calibration run
3 and mathematical calibration run 5 on the scale of their respective run 1
calibrations, were not successful.
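
To make the role of the slope and intercept concrete, the sketch below applies
a linear scale transformation to one item's three-parameter logistic estimates.
It assumes the usual convention in which ability is rescaled as
theta* = slope * theta + intercept, so that difficulty is rescaled the same way,
discrimination is divided by the slope, and the lower asymptote is unchanged.
Whether the Table 2 values were defined with exactly this convention is an
assumption, and the item values are hypothetical.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rescale_item(a, b, c, slope, intercept):
    """Move one item's estimates to the target scale (theta* = slope*theta + intercept)."""
    return a / slope, slope * b + intercept, c


# Slope and intercept reported in Table 2 for verbal edition C5 (run 3 to run 2).
SLOPE, INTERCEPT = 1.0931, 0.0373

# Hypothetical item on the run 3 scale.
a, b, c = 1.2, 0.5, 0.15
a_star, b_star, c_star = rescale_item(a, b, c, SLOPE, INTERCEPT)

# The response probability is unchanged when ability is rescaled the same way.
theta = 0.8
p_before = icc_3pl(theta, a, b, c)
p_after = icc_3pl(SLOPE * theta + INTERCEPT, a_star, b_star, c_star)
print(abs(p_before - p_after) < 1e-12)
```

A slope of one and an intercept of zero leave the estimates untouched, which is
why values close to one and zero indicate that two sets of estimates were
already on nearly the same scale.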

Figure 5 contains difference plots for the two types of special equatings
that involved equating a test edition to itself; these were described earlier in
the text and in Table 1. Only the results of special equatings 1 are relevant
for the present discussion. Special equatings 1 involve equating an old edition
of SAT-verbal or SAT-mathematical to itself using item parameter estimates that
are the result of the transformations carried out for the partial pre-calibration
study and parameter estimates that are a result of the direct link
transformations. The difference plots contain discrepancies (in raw score units)
between special equating results and the identity transformation (special
equating results minus identity transformation) for the full range of possible
raw scores.



Insert Figure 5 about here



The difference plots contained in Figure 5 for special equatings 1 are
designed to assess how well the partial pre-calibration study transformations
placed item parameter estimates for the editions used in the respective equatings
on the same scale. These plots show very different results for the old editions
involved in problematic partial pre-calibration equatings (C5 for SAT-verbal, E3
for SAT-mathematical) than for the old editions involved in the non-problematic
partial pre-calibration equatings (B7 for SAT-verbal, C3 for SAT-mathematical).
As can be seen from examination of these plots, equating editions C5 and E3 to
themselves resulted in fairly large residuals when compared to the identity
transformation. In contrast, residuals obtained from equating editions B7 and C3
to themselves were quite small. The plots provide a clear indication (as did the
previously described transformation runs) that the editions used in the
problematic partial pre-calibration equatings were not adequately placed on their
respective run 1 scales by the transformations that were carried out for that
study.



Figure 6 contains difference plots for the final set of equatings summarized
in Table 1, referred to as direct link equatings. The four equatings shown in
Figure 6 represent SAT-verbal new edition E7 equated to old editions C5 and B7
and SAT-mathematical new edition F3 equated to old editions C3 and E3. Recall that
parameter estimates for the direct link equatings were placed on scale by a
single transformation using the respective 85 item verbal or 60 item mathematical
test edition as the linking test. The direct link equatings are compared to
equatings obtained using parameter estimates placed on scale by transformations
carried out for the partial pre-calibration study and also to the criterion
concurrent calibration equatings.

Insert Figure 6 about here 

Examination of the difference plots shown in Figure 6 reveals that the
equatings based on the direct link transformations agree very closely with the
criterion concurrent calibration equatings. The outlier equatings are clearly
the SAT-verbal E7 to C5 equating and the SAT-mathematical F3 to E3 equating based
on the partial pre-calibration study transformations. These results not only
confirm previous evidence that the transformations for the partial
pre-calibration study failed to place verbal editions E7 and C5 and mathematical
editions F3 and E3 on scale together, they also (by their close agreement with
the equatings based on the concurrent calibrations) substantiate the use of the
concurrent calibration equatings as criterion equatings for the study.

Errors of Estimation 

As mentioned previously, one possible source of the discrepant results
obtained by equating verbal edition E7 to C5 and mathematical edition F3 to
edition E3 might be errors of estimation that occurred during item calibration.
To explore this possibility, special equatings 2 (see Table 1) were carried out.
Recall that special equatings 2 involved equating a test edition to itself using one
set of item parameter estimates that were the result of the direct link
transformations and a second set of parameter estimates resulting from the
transformations carried out for the partial pre-calibration study. These
equatings should result in a 45° line, if the direct link transformations are
viable and if errors of estimation did not seriously affect the parameter
estimates of the items in the test edition as it appears in the separate
calibration runs.
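
The logic of equating an edition to itself can be illustrated with a small
sketch. It assumes IRT true-score equating through test characteristic curves
(an assumption made here for illustration; the equating procedure is not spelled
out in this section), a scaling constant of D = 1.7, and hypothetical item
estimates. When the two sets of estimates for the same items agree, every
equated score equals the original score, which is the 45° line.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Test characteristic curve: expected number-right score at theta."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)


def theta_for_score(score, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the test characteristic curve by bisection (it increases with theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equating_residuals(scores, items_x, items_y):
    """Equated score minus original score, i.e. departure from the identity line."""
    return [true_score(theta_for_score(s, items_x), items_y) - s for s in scores]


# Hypothetical three-item edition; scores must lie above the sum of the c values.
estimates_1 = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.9, 0.6, 0.2)]
estimates_2 = [(a, b + 0.2, c) for a, b, c in estimates_1]  # a shifted second set
scores = [1.0, 1.5, 2.0, 2.5]

same = equating_residuals(scores, estimates_1, estimates_1)
shifted = equating_residuals(scores, estimates_1, estimates_2)
print([round(r, 3) for r in shifted])
```

The residuals for identical estimates are zero up to numerical tolerance, while
the shifted set produces residuals of one sign across the score range, the
pattern a difference plot would show if the two sets were not on a common scale.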

Figure 5 contains difference plots resulting from equating a test to itself
for each of the old editions used in this study: verbal editions C5 and B7 and
mathematical editions C3 and E3. The results of interest for this discussion are
those showing a comparison of special equatings 2 to the zero residual line. As
can be seen from an inspection of the plots, the residuals resulting from special
equatings 2 are very small, indicating close agreement between the item parameter
estimates obtained for the respective test editions as they appeared in the
separate calibration runs. The results of special equatings 2 can be interpreted
as indicating that estimation error is not a plausible explanation for the poor
equatings obtained for editions E7 to C5 and F3 to E3 in the partial
pre-calibration study.

Ability Levels of Samples
One possible explanation for the poor results obtained for the
transformations carried out for the partial pre-calibration study might be
differences in the ability levels of samples used to calibrate linking items
administered with the new and old test editions. Table 3 presents summary data
on performance on equating sections used to provide the links between adjacent
calibration runs in the sections of the SAT-verbal and SAT-mathematical partial
pre-calibration linkage systems being studied. Means and standard deviations are
reasonably similar on the equating sections for groups used in the separate
calibrations, with one notable exception. Mean performance on SAT-verbal
equating section gw is quite different for the groups involved in calibration
runs 1 and 4, about a third of a standard deviation different. In the Stocking
and Eignor (1986) study, at around this level of difference in ability the
researchers began to note some small differences in equating results due to
differences in the precision with which the parameters were estimated. Hence,
the Stocking and Eignor results could prove useful in explaining the poor Cook et
al. (1985) partial pre-calibration results except for the fact that equating
section gw, connecting calibration runs 1 and 4, provides the link that places
old edition B7 on the base scale (run 1). As seen in Figure 4, the partial
pre-calibration equating of new SAT-verbal edition E7 to old edition B7 provided
excellent results when compared to the concurrent criterion. For the linking
sections involved in the inadequate partial pre-calibrations (gs linking runs 1
and 3 for SAT-verbal, hf + gt linking runs 1 and 4 and gj linking runs 4 and 5
for SAT-mathematical), ability levels of the two groups used in the calibrations
and subsequent linkings were reasonably similar. In sum, it would appear that
differences in the ability levels of the samples used in calibration and linking
are not a major contributing factor to the poor partial pre-calibration equating
results under study.
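
The "about a third of a standard deviation" figure can be checked directly
from the Table 3 means and standard deviations. The sketch below expresses each
mean difference in units of a pooled standard deviation; because sample sizes
are not reported in this section, the two variances are pooled with equal
weight, which is an assumption, and the function name is hypothetical.

```python
import math


def standardized_mean_difference(mean_1, sd_1, mean_2, sd_2):
    """Mean difference in units of an equally weighted pooled standard deviation."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    return (mean_1 - mean_2) / pooled_sd


# Table 3, SAT-verbal: section gw in runs 1 and 4, section gs in runs 1 and 3.
gw_runs_1_4 = standardized_mean_difference(16.53, 7.80, 14.07, 7.90)
gs_runs_1_3 = standardized_mean_difference(16.11, 7.82, 16.08, 7.95)

print(round(gw_runs_1_4, 2), round(gs_runs_1_3, 3))
```

Under these assumptions the gw difference is a little under a third of a
standard deviation, while the gs difference, which belongs to the problematic
E7 to C5 equating, is close to zero.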

Insert Table 3 about here

Differential Item Functioning in Linking Tests
Since differential item functioning (DIF) was found to be a major factor in
the adequacy of transformations carried out for the study by Cook, Eignor and
Wingersky (1987), the presence of DIF in the common item linking sections was
studied for the partial pre-calibration equatings. For this aspect of the study,
major emphasis was placed on the weighted mean absolute difference (MAD) index in
deciding on which items exhibited DIF to a degree that removal from the linking
section seemed reasonable.

Figures 7 and 8 contain ordered stem and leaf diagrams of mean absolute
differences (MAD) between item response functions for the SAT-verbal and
SAT-mathematical equating sections after application of the partial
pre-calibration study transformations. Careful consideration of the
distributions of these MAD values, in conjunction with plots of the item
characteristic curves derived from the two calibrations for each linking item,
led to the decision that a MAD value greater than .035 would provide an
indication of DIF. This cut-off is represented by the dotted line in the stem
and leaf diagrams in Figures 7 and 8.
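
Applying the cut-off amounts to a simple filter over the MAD values of a
linking section. The sketch below uses hypothetical item identifiers and MAD
values; only the .035 cut-off comes from the study, and an item whose value
equals the cut-off is retained because the rule is "greater than .035".

```python
MAD_CUTOFF = 0.035  # cut-off adopted in this study


def flag_dif_items(mad_by_item, cutoff=MAD_CUTOFF):
    """Return the identifiers of items whose MAD exceeds the cut-off."""
    return sorted(item for item, mad in mad_by_item.items() if mad > cutoff)


# Hypothetical MAD values for a five-item linking section.
mad_values = {
    "item01": 0.012,
    "item02": 0.041,
    "item03": 0.035,
    "item04": 0.058,
    "item05": 0.020,
}
flagged = flag_dif_items(mad_values)
retained = [item for item in mad_values if item not in flagged]
print(flagged, retained)
```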



Insert Figures 7 and 8 about here

Use of the .035 cut-off as an indication of DIF led to the identification of
a small number of SAT-verbal and SAT-mathematical items that should possibly be
removed from the partial pre-calibration linking sections. The item response
functions for these items (on the same scale), derived from the separate
calibrations, are presented in Figures 9 and 10. Consistent with expectations,
sections linking editions exhibiting poor partial pre-calibration equating
results (gs linking runs 1 and 3 for SAT-verbal, hf + gt linking runs 1 and 4 for
SAT-mathematical) contained a greater number of DIF items than did sections
linking editions that provided acceptable partial pre-calibration equating
results.

Insert Figures 9 and 10 about here

The study of the presence of DIF items in the concurrent calibrations proved
to be a difficult task in that only item parameter estimates based on the
combined group of examinees responding to the linking items were available (see
Figure 1). Thus, a summary index, such as MAD, could not be calculated. Because
no procedure that paralleled the one employed to remove DIF items from the
linking tests used for the partial pre-calibration transformation runs could be
applied to the concurrent calibration linking sections, a decision was made not
to rerun any of the criterion concurrent calibrations with items removed. The
implications of this decision will be discussed further in the conclusion
section.



Partial Pre-calibration Equatings with DIF Items Removed
Items in the linking sections connecting partial pre-calibration equating
calibration runs having MAD values greater than the .035 cut-off were removed
from the linking sections and the characteristic curve transformation runs were
redone. Figure 11 presents difference plots for the partial pre-calibration
equatings with DIF items removed (referred to as current partial pre-calibration
equatings) along with the previous partial pre-calibration results and the direct
link equating results. The criterion in these plots is again the concurrent
calibration equating results.

Insert Figure 11 about here 

For the poor partial pre-calibration equatings (E7 to C5 for SAT-verbal, F3
to E3 for SAT-mathematical), removal of DIF items resulted in modest reductions
in raw score differences when compared to the criterion concurrent calibration
equating results. In other words, the current partial pre-calibration equating
results provide only a slight improvement over previous results that were
considered problematic. For the acceptable partial pre-calibration equatings
from the Cook et al. (1985) study (E7 to B7 for SAT-verbal, F3 to C3 for
SAT-mathematical), removal of DIF items did not improve the partial pre-calibration
results much at all and in some places on the score scale, differences from the
criterion concurrent calibration equatings were increased slightly.

CONCLUSION

The purpose of this study was to investigate factors that may have led to
poor partial pre-calibration equating results in a previous study (Cook et al.,
1985) and to attempt to improve on the poor partial pre-calibration results.
First, a series of item parameter transformations and equatings were carried out
in an attempt to determine if the poor results from the previous study were
related to the item parameter transformations or calibrations. Convinced that
the problem equatings were a result of the transformations, the authors then
focused on 1) possible differences in the ability levels of the samples used to
calibrate the items used for linking purposes, and 2) the possible existence of
differential item functioning of the linking items for the samples used to
calibrate the data.

Examination of summary performance data for samples taking sections linking
editions exhibiting poor partial pre-calibration results led to the conclusion
that ability differences were not a major contributor to poor equating results.
Removal of DIF items from sections linking editions exhibiting poor partial
pre-calibration equating results led to only modest improvements in these results
when compared to the concurrent calibration criterion equatings. These results
seem to run counter to those observed by Cook, Eignor, and Wingersky (1987) when
these researchers simulated DIF items in common item linking sections. However,
the DIF items used in the Cook, Eignor, and Wingersky study exhibited much
greater differences in item parameter estimates (and, hence, much larger MAD
values) than the items assumed to be demonstrating DIF in this study. In sum,
these two factors, singly or together, do not appear to be the sole contributors
to the poor partial pre-calibration equating results in the Cook et al. (1985) study.

One criticism of this study lies in the fact that no items potentially
exhibiting DIF were removed from the concurrent calibrations and subsequent
equating results. Had such items been located and removed, differences between
the partial pre-calibration equating results (with DIF items removed) and
concurrent calibration equating results may have decreased more. However, it
should be recalled that the direct link equatings, based on entirely different
linking tests, agreed closely with the concurrent calibration equatings. Thus,
it seems reasonable to assume that new and old editions were placed on scale
properly by the concurrent calibrations (in spite of any DIF items that might
have been present) and that equating discrepancies from this criterion indicate
poor results because the partial pre-calibration transformations carried out with
and without removal of DIF items did not result in the editions being on scale
together.

The results of this study, which indicate that removal of DIF items from a
linking or equating test does not substantially improve equating results, are
difficult to accept. It has long been common practice to inspect items for DIF
and to remove those exhibiting substantial differences when carrying out
conventional equating procedures. Cook and Petersen (in press) summarize
research conducted to investigate properties of linking items and how these
properties affect equating results. They conclude, using research by a variety
of investigators, that one must be very careful that linking tests represent, as
much as possible, identical tasks for groups of examinees who take the new and
old editions of a test to be equated, i.e., the presence of differential item
functioning can have a serious effect on equating results.

A question of interest for the present study then is, why didn't removal of
the items exhibiting DIF have a substantial effect on the equating results?
There are a number of possible explanations that can be offered for the results.
First, perhaps the procedures used to detect DIF were inadequate. Recall that DIF
was determined by the use of the MAD index and corroborated by visual inspection
of plots of item response functions. It is possible that a non-IRT approach,
such as the use of the Mantel-Haenszel (Holland and Thayer, 1986) or
standardization (Dorans and Kulick, 1986) statistics, may provide a better means
of identifying DIF items.

Secondly, the study carried out by Cook, Eignor and Wingersky (1987)
examined the effect of including two items each exhibiting a substantial amount
of DIF in a single direction (i.e., both items biased against the same group) in
the linking test. Perhaps a more likely occurrence would be a small number of
items exhibiting DIF, but in opposite directions, thus having a neutralizing
effect on each other. Or, a typical situation may be a fairly large number of
items, each exhibiting small but consistent DIF in a single direction, and hence
having a cumulative effect on the transformation and, ultimately, the equating
results. It is possible that either of these two situations existed for linking
tests used for the partial pre-calibration study, but the procedures decided upon
to detect DIF were not adequate to isolate such occurrences.

To summarize, the results of the specific investigations carried out in the
present study did not provide an explanation for the poor equating results
obtained in the partial pre-calibration study. This is particularly
disappointing since the results of two simulation studies (Cook, Eignor and
Wingersky, 1987; Eignor and Stocking, 1986) strongly indicated that the
problematic results could be related to either differences in ability levels of
samples used to calibrate the linking tests or the presence of DIF items in the
linking tests or both.

The results of the study do provide an indication that inadequate
partial pre-calibration equating results from the Cook et al. (1985) study were
related to the fact that editions used in the problematic equatings were not on
scale together as a result of application of the characteristic curve
transformation procedure. It is apparent that some other factor or factors
besides ability level differences and the presence of DIF items must be
influencing these transformations. Given that the characteristic curve
transformation procedure is a frequently used procedure by IRT practitioners
involved in equating and differential item functioning studies, this is clearly
an area where further research should be emphasized.

Figure 1: Concurrent calibration designs for SAT-verbal and
          SAT-mathematical. (The "Xs" indicate the tests
          taken by the respective samples.)

Figure 2: Depiction of equating relationships among the
          SAT-verbal and SAT-mathematical editions chosen for
          further study.

Figure 3: Portions of SAT-verbal and SAT-mathematical partial pre-calibration
          transformation plans containing specific editions under investigation
          in this study. Upper case letters and numbers designate operational
          editions; lower case letters designate equating sections. Parameter
          estimates for the partial pre-calibration equatings came from the
          editions that are circled. Numbers identify specific calibration
          runs.

Figure 4

Plots of Raw-Score Equating Differences Derived from a Comparison of
Partial Pre-calibration Equating Results to Concurrent
Equating Results for SAT-verbal and SAT-mathematical
Editions Being Studied

Partial pre-calibration and concurrent equating results taken from Cook et al.
(1985) study.

Figure 5

Plots of Raw-Score Equating Differences Derived from a Comparison
of Special Equating Results to the Identity Transformation
for SAT-verbal and SAT-mathematical Old Form
Editions Being Studied

Special equatings are defined in Table 1 (see Equating 1 and Equating 2).

Figure 6

Plots of Raw-Score Equating Differences Derived from a Comparison
of Partial Pre-calibration Equating Results and Direct Link
Equating Results to Concurrent Equating Results for SAT-verbal and
SAT-mathematical Editions Being Studied

Figure 7

Stem and Leaf Diagrams of Mean Absolute Differences (MAD) Between
Item Response Functions for SAT-verbal Equating Sections

Figure 8

Stem and Leaf Diagrams of Mean Absolute Differences (MAD) Between
Item Response Functions for SAT-mathematical Equating Sections

Figure 9

Plots of Item Response Functions for Items Removed from
SAT-verbal Equating Sections

Figure 10

Plots of Item Response Functions for Items Removed from
SAT-mathematical Equating Sections

Equating Sections hf + gt - Linking Runs 1 and 4

Figure 11

Plots of Raw-Score Equating Differences Derived from a Comparison of Previous
Partial Pre-calibration, Current Partial Pre-calibration and Direct
Link Equating Results to Concurrent Equating Results for SAT-verbal
and SAT-mathematical Editions Being Studied

Legend: PREVIOUS PARTIAL PRE-CALIBRATION; CURRENT PARTIAL PRE-CALIBRATION;
DIRECT LINK; CONCURRENT CRITERION

Previous partial pre-calibration results were taken from Cook et al. (1985) equatings.
Current partial pre-calibration results involve same editions and linkings as previous
results except that items exhibiting DIF have been removed from common item linking
sections. Direct link equatings are described in Table 1.

Table 1

Summary of Special Equatings Performed to Study
Linking Problems and Estimation Errors

Old       Direct link transformation          Parameter estimates    Parameter estimates    Parameter estimates for
Edition                                       for Equating 1         for Equating 2         Direct Link Equating

SAT-Verbal

C5        C5 (run 3) placed on scale of       C5 (run 3)* to         C5 (run 3)* to         E7 (run 2) to
          run 2 using C5 items,               C5 (run 3)             C5 (run 2)             C5 (run 3)*
          i.e., C5 (run 3)*

B7        B7 (run 4) placed on scale of       B7 (run 4)* to         B7 (run 4)* to         E7 (run 2) to
          run 2 using B7 items,               B7 (run 4)             B7 (run 2)             B7 (run 4)*
          i.e., B7 (run 4)*

SAT-Mathematical

C3        C3 (run 3) placed on scale of       C3 (run 3)* to         C3 (run 3)* to         F3 (run 2) to
          run 2 using C3 items,               C3 (run 3)             C3 (run 2)             C3 (run 3)*
          i.e., C3 (run 3)*

E3        E3 (run 5) placed on scale of       E3 (run 5)* to         E3 (run 5)* to         F3 (run 2) to
          run 2 using E3 items,               E3 (run 5)             E3 (run 2)             E3 (run 5)*
          i.e., E3 (run 5)*

1 Parameter estimates for all editions used in these equatings have already been placed on the base
  scale (run 1 in Figure 3) as part of the Cook et al. (1985) study.

2 The run, identified in Figure 3, from which the parameter estimates were taken is identified in
  parentheses, i.e., C5 parameter estimates from run 3 in Figure 3.

3 The asterisk indicates that the parameter estimates have been transformed to the scale of a
  different calibration run identified in Figure 3.

Table 2

Linear Parameters Obtained from Direct Link
Item Parameter Transformations

                                                                 Linear Parameters
Test               New Edition   Base Edition   Common Items    Slope     Intercept

SAT-verbal         B7 (run 4)    B7 (run 2)     B7               .9996      .0183
SAT-verbal         C5 (run 3)    C5 (run 2)     C5              1.0931      .0373
SAT-mathematical   E3 (run 5)    E3 (run 2)     E3               .9042     -.0278
SAT-mathematical   C3 (run 3)    C3 (run 2)     C3              1.0073      .0148

Table 3

Equating Section Summary Data for Adjacent
SAT-verbal and SAT-mathematical Calibrations

SAT-verbal

Equating    Calibration   Equating Section    Calibration   Equating Section
Section     Run(1)        Mean      S.D.      Run(1)        Mean      S.D.

gw          1             16.53     7.80      2             16.87     8.07
gs          1             16.11     7.82      3             16.08     7.95
gw          1             16.53     7.80      4             14.07     7.90

SAT-mathematical

Equating    Calibration   Equating Section    Calibration   Equating Section
Section     Run(1)        Mean      S.D.      Run(1)        Mean      S.D.

gh          1              8.85     5.54      2              8.41     5.75
gh          1              8.85     5.54      3              8.40     5.51
hf+gt       1             19.68     (2)       4             20.01     (2)
gj          4              9.91     6.11      5              9.23     6.14

(1) Refers to specific calibration run identified in Figure 3.
(2) Could not be calculated from available data.

Appendix

Partial Pre-calibration Transformation Plan/Linkage System

An elaborate linkage system was devised in the Cook et al. (1985) study to
allow placement of item parameter estimates on a common scale so that IRT
equating resulting from a partial pre-calibration design could be investigated.

Figures A-1 and A-2 illustrate the design used for the verbal and
mathematical sections, respectively. Each figure depicts the linkages necessary
to place nine new editions, fifteen old editions, and associated equating tests
on the base scale. It should be noted that upper case letter and number
combinations indicate operational sections of the SAT, lower case letters
indicate equating sections, boxes with solid lines enclose old test editions, and
boxes with dotted lines indicate new test editions.

Lower case letter combinations, or occasionally, upper case letter and
number combinations indicated above the arrows in Figures A-1 and A-2, denote
common items that were used to place item parameter estimates from separate
calibrations on a common scale via the characteristic curve transformation
procedure (Stocking and Lord, 1983).
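
The idea behind a characteristic curve transformation can be sketched as
follows: choose the slope and intercept that make the test characteristic curve
of the rescaled common items agree as closely as possible with the curve
implied by the base-scale estimates of the same items. The sketch below is a
simplified stand-in, not the Stocking and Lord (1983) implementation: it sums
squared differences over a fixed grid of ability values rather than over
examinees, searches by repeated grid refinement instead of a numerical
optimizer, and assumes D = 1.7. All names and item values are hypothetical.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item response function (D = 1.7 is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-right score at theta."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)


def rescale(items, slope, intercept):
    """Apply theta* = slope * theta + intercept to a list of (a, b, c) estimates."""
    return [(a / slope, slope * b + intercept, c) for a, b, c in items]


def criterion(slope, intercept, base_scores, new_items, thetas):
    """Sum of squared differences between the two characteristic curves."""
    moved = rescale(new_items, slope, intercept)
    return sum((s - tcc(t, moved)) ** 2 for s, t in zip(base_scores, thetas))


def fit_linking_constants(base_items, new_items, thetas, rounds=14):
    """Grid-refinement search for the slope and intercept (illustrative only)."""
    base_scores = [tcc(t, base_items) for t in thetas]
    slope, intercept, step = 1.0, 0.0, 0.5
    for _ in range(rounds):
        grid = [
            (slope + i * step / 5.0, intercept + j * step / 5.0)
            for i in range(-5, 6)
            for j in range(-5, 6)
            if slope + i * step / 5.0 > 0.0
        ]
        slope, intercept = min(
            grid, key=lambda p: criterion(p[0], p[1], base_scores, new_items, thetas)
        )
        step /= 2.0  # narrow the search window around the current best point
    return slope, intercept


# Hypothetical common items on the base scale, and the same items expressed on
# a second scale that differs by a known slope and intercept.
base_items = [(0.8, -1.5, 0.20), (1.0, -0.5, 0.15), (1.2, 0.0, 0.20),
              (0.9, 0.7, 0.25), (1.1, 1.5, 0.20)]
TRUE_SLOPE, TRUE_INTERCEPT = 1.07, 0.13
new_items = [(a * TRUE_SLOPE, (b - TRUE_INTERCEPT) / TRUE_SLOPE, c)
             for a, b, c in base_items]
thetas = [-3.0 + 0.25 * k for k in range(25)]

slope, intercept = fit_linking_constants(base_items, new_items, thetas)
print(round(slope, 3), round(intercept, 3))
```

With error-free estimates the search should recover the known constants; with
real calibrations the two sets of estimates differ by more than a linear
transformation, and the quality of the fit depends on the linking items.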

For SAT-verbal, the calibration run containing new edition E8 and eight 
equating sections, which were administered in November 1982, was used as the base 
item parameter scale to which all other sets of item parameter estimates were 
scaled. For example, item parameter estimates on a common scale exist for new 
verbal edition E7 and equating test gw from a previous concurrent calibration run 
(which is fully depicted in Figure 3 in the text). Item parameter estimates for 
gw also existed from the calibration run used as the base scale. The item 
parameter estimates for gw that were on scale with those for verbal edition E7 
were scaled to those that are on the base E8 scale (using the characteristic 
curve transformation method). The resulting transformation was then applied to 
the item parameter estimates for verbal edition E7 to convert them to the verbal 
base scale. The item parameter estimates for each new and old edition listed in 
Figure 2 were converted to the base scale in a manner similar to that just 
described for verbal edition E7. For some editions, more than one scaling was 
required to convert the item parameter estimates to the base scale. For example, 
item parameter estimates for edition F1 were converted to the same scale as those 
of edition E9 through equating test hy, which was then converted to the base 
scale through equating test gq. Item parameter estimates for new editions F5 and 
F6 were placed on scale in several different ways in the Cook et al. (1985) 
study. Those for verbal edition F5 were placed on scale using items from either 
equating section he or equating section gs or from the pooled he and gs equating 
sections. In addition, new editions F5 and F6 were used to study the effect of 
both full pre-calibration and pre-equating in the following way. Approximately 
50% of the items contained in both editions were placed on the base scale as 
pretest items administered with test editions Dfi, E5, and E6. The remaining 50% 
of the items were placed on scale when they appeared in the final editions of F5 
and F6, which were calibrated at their respective initial administrations. Item 
parameter estimates from these different calibrations were assembled (after they 
had been placed on the E8 scale) into pre-calibrated editions F5 and F6. The 
editions were then equated (simulating pre-equating) to their respective old 
editions. 
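
The chained scaling described for edition F1 (first to the scale of edition E9 through equating test hy, then to the base scale through equating test gq) amounts to composing two linear transformations. The sketch below illustrates this under the usual assumption that a scale change takes the form theta* = A*theta + B, with a divided by A, b mapped to A*b + B, and c unchanged. The constants and item values are hypothetical and are not taken from the study.

```python
def to_scale(items, A, B):
    """Apply one linear scale change to a list of (a, b, c) estimates."""
    return [(a / A, A * b + B, c) for a, b, c in items]

def compose(first, second):
    """Combine two scale changes, applied in order, into a single (A, B)."""
    A1, B1 = first
    A2, B2 = second
    # theta2 = A2 * (A1 * theta + B1) + B2
    return (A2 * A1, A2 * B1 + B2)

# Hypothetical transformation constants for the two links in the chain.
f1_to_e9 = (1.25, 0.50)     # link through equating test hy (values assumed)
e9_to_base = (0.80, -0.25)  # link through equating test gq (values assumed)

# Hypothetical item parameter estimates for edition F1 on its own scale.
f1_items = [(1.0, -0.4, 0.20), (0.7, 0.9, 0.18)]

# Two-step conversion versus one combined conversion.
stepwise = to_scale(to_scale(f1_items, *f1_to_e9), *e9_to_base)
combined = to_scale(f1_items, *compose(f1_to_e9, e9_to_base))
```

Both routes give the same estimates up to floating-point rounding. One consequence of chaining is that any error in the first link is carried through every later link to the base scale.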

The linking plan for SAT-mathematical (depicted in Figure A-2) is virtually 
identical to the plan previously described for SAT-verbal. The only difference 
between Figures A-1 and A-2 is the lower case letters used to designate the 
equating sections. 

¹ Upper case letters and numbers designate operational editions; lower case letters designate equating sections. 

² Dotted lines indicate new editions; solid lines indicate old editions. 

³ Edition F5 was placed on the E8 scale three ways: (1) using items contained in equating section he; 
(2) using items contained in equating section gs; and (3) using items contained in pooled equating 
sections he and gs. 

Figure A-2 
SAT-Mathematical Transformation Plan¹,² 

[Diagram not recoverable from the scan. Legible labels: base calibration run E8 with its equating sections; F5 and F6 pretest items.] 

¹ Upper case letters and numbers designate operational editions; lower case letters designate equating sections. 

² Dotted lines indicate new editions; solid lines indicate old editions. 

³ Edition F5 was placed on the E8 scale three ways: (1) using items contained in equating section hf; 
(2) using items contained in equating section gt; and (3) using items contained in pooled equating 
sections hf and gt. 